Guide for setting up AWS

This guide will walk you through setting up your AWS account to work with DataStori. 🚀

DataStori runs data pipelines in your environment using AWS Fargate and securely integrates with your account using a cross-account IAM role.


Prerequisites

Before you begin the setup, please have the following information and resources ready in your AWS account.

📋 Resource Checklist

  • Networking
    • VPC ID: The ID of the Virtual Private Cloud for running pipelines.
    • Subnet IDs: A list of subnet IDs where the pipelines will run.
    • Security Group IDs: A list of security group IDs to apply to the pipeline containers.
  • Services
    • ECS Cluster: Create a new AWS Fargate ECS cluster and note its ARN.
    • S3 Bucket: The name of the S3 bucket where pipeline data will be stored.
    • S3 Bucket Region: The AWS region where your S3 bucket is located (e.g., us-east-1).
    • RDBMS (Optional): Connection details for any relational database you plan to use.
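
It can help to collect the checklist values in one place and sanity-check them before starting. The following is a minimal Python sketch, not part of DataStori's tooling: the `ResourceChecklist` class and its field names are hypothetical, and the checks only look at the usual `vpc-`, `subnet-`, and `sg-` ID prefixes and the general ARN shape. They do not confirm that the resources exist in your account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceChecklist:
    """Hypothetical container for the checklist values; not a DataStori API."""

    vpc_id: str
    subnet_ids: List[str]
    security_group_ids: List[str]
    ecs_cluster_arn: str
    s3_bucket: str
    s3_bucket_region: str

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means the basic checks passed."""
        found: List[str] = []
        if not self.vpc_id.startswith("vpc-"):
            found.append(f"VPC ID should start with 'vpc-': {self.vpc_id!r}")
        if not self.subnet_ids:
            found.append("at least one subnet ID is required")
        for subnet in self.subnet_ids:
            if not subnet.startswith("subnet-"):
                found.append(f"subnet ID should start with 'subnet-': {subnet!r}")
        for group in self.security_group_ids:
            if not group.startswith("sg-"):
                found.append(f"security group ID should start with 'sg-': {group!r}")
        arn = self.ecs_cluster_arn
        if not (arn.startswith("arn:") and ":cluster/" in arn):
            found.append(f"ECS cluster ARN looks malformed: {arn!r}")
        if not self.s3_bucket:
            found.append("S3 bucket name is empty")
        if not self.s3_bucket_region:
            found.append("S3 bucket region is empty")
        return found


# Example values only; replace them with the IDs from your own account.
checklist = ResourceChecklist(
    vpc_id="vpc-0123abcd",
    subnet_ids=["subnet-abcde123", "subnet-fghij456"],
    security_group_ids=["sg-5678efgh"],
    ecs_cluster_arn="arn:aws:ecs:us-east-1:123456789012:cluster/YourClusterName",
    s3_bucket="your-datastori-bucket",
    s3_bucket_region="us-east-1",
)
print(checklist.problems() or "basic checks passed")
```

An empty result only means the values have the expected shape. A typo inside an otherwise well-formed ID will still pass.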

IAM Configuration Steps

You'll need to create two IAM roles and two IAM policies to grant DataStori the necessary permissions.

Step 1: Create the Log Group IAM Policy

This policy allows the Fargate container to create log groups and log streams and to write log events to them.

  1. Navigate to IAM -> Policies and click Create policy.
  2. Switch to the JSON tab and paste the following code.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}

  3. Click Next and name the policy log_group_creation_policy.
  4. Click Create policy.

Step 2: Create the ECS Task Execution Role

This role allows the Fargate container to pull images and write logs.

  1. Navigate to IAM -> Roles and click Create role.
  2. For the trusted entity, select AWS service, and for the use case, choose Elastic Container Service.
  3. Select the Elastic Container Service Task use case and click Next.
  4. On the permissions page, the AmazonECSTaskExecutionRolePolicy will be attached by default. Also attach the AmazonS3FullAccess policy and the log_group_creation_policy created in Step 1, then click Next.
  5. Name the role datastori-ecs-task-execution-role and click Create role.
  6. Once created, find the role and copy its ARN. You will need this for the next step.
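
Choosing the Elastic Container Service Task use case gives the role a trust policy that lets the `ecs-tasks.amazonaws.com` service principal assume it. As a rough sketch, the Python below shows the expected shape of that trust policy and a small check you could run against the JSON copied from the role's trust relationship. The function names are hypothetical, and the check is deliberately loose: it ignores any conditions on the statement.

```python
import json


def ecs_task_trust_policy() -> dict:
    """Expected shape of the trust policy for an ECS task role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def as_list(value) -> list:
    """IAM allows a single string or a list in many fields; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def trusts_ecs_tasks(policy: dict) -> bool:
    """Return True if some Allow statement lets ecs-tasks.amazonaws.com assume the role."""
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. a wildcard principal, which this sketch does not accept
        services = as_list(principal.get("Service", []))
        actions = as_list(statement.get("Action", []))
        if (
            statement.get("Effect") == "Allow"
            and "ecs-tasks.amazonaws.com" in services
            and "sts:AssumeRole" in actions
        ):
            return True
    return False


print(json.dumps(ecs_task_trust_policy(), indent=2))
print(trusts_ecs_tasks(ecs_task_trust_policy()))
```

If the check fails for the role you created, the most likely cause is that a different use case was selected in the role wizard.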

Step 3: Create the DataStori Management Policy

This policy defines the specific actions DataStori is allowed to perform, like starting and stopping pipeline tasks.

  1. Navigate to IAM -> Policies and click Create policy.
  2. Switch to the JSON tab and paste the following code.
  3. Important: Replace <YOUR_AWS_ACCOUNT_ID> with your actual 12-digit AWS Account ID.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RunAndInspectTasks",
      "Effect": "Allow",
      "Action": [
        "ecs:RunTask",
        "ecs:StopTask",
        "ecs:DescribeTasks",
        "ecs:ListTasks",
        "ecs:DescribeTaskDefinition"
      ],
      "Resource": "*"
    },
    {
      "Sid": "RegisterAndCleanupTaskDefinitions",
      "Effect": "Allow",
      "Action": [
        "ecs:RegisterTaskDefinition",
        "ecs:DeregisterTaskDefinition"
      ],
      "Resource": "*"
    },
    {
      "Sid": "PassSingleRoleToECSTasks",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/datastori-ecs-task-execution-role",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "ecs-tasks.amazonaws.com"
        }
      }
    }
  ]
}

  4. Click Next, name the policy DataStori-ECSTaskManage-Policy, and click Create policy.
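
A forgotten or mistyped account ID in the `iam:PassRole` resource is an easy mistake to make when pasting the policy. As a minimal sketch, the Python below fills the `<YOUR_AWS_ACCOUNT_ID>` placeholder and confirms the result is still valid JSON before you paste it into the console. The `fill_account_id` helper is hypothetical, and the template is shortened to the PassRole statement for brevity.

```python
import json

PLACEHOLDER = "<YOUR_AWS_ACCOUNT_ID>"


def fill_account_id(policy_template: str, account_id: str) -> str:
    """Return the policy text with the placeholder replaced by a 12-digit account ID."""
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise ValueError("an AWS account ID is exactly 12 digits")
    if PLACEHOLDER not in policy_template:
        raise ValueError(f"template does not contain {PLACEHOLDER}")
    filled = policy_template.replace(PLACEHOLDER, account_id)
    json.loads(filled)  # raises ValueError if the result is not valid JSON
    return filled


# Shortened template holding only the PassRole statement from the policy above.
template = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PassSingleRoleToECSTasks",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/datastori-ecs-task-execution-role"
    }
  ]
}
"""

# 123456789012 is an example account ID; use your own.
print(fill_account_id(template, "123456789012"))
```

The role name in the resource ARN must match the name given to the task execution role in Step 2, otherwise the PassRole statement will not apply to it.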

Step 4: Create the Cross-Account Role for DataStori

This final role trusts DataStori's AWS account and uses the policy you just created to grant permissions.

  1. Navigate to IAM -> Roles and click Create role.
  2. For the trusted entity type, select AWS account and choose Another AWS account.
  3. Enter the Account ID for DataStori. (Please ask DataStori customer support for this ID).
  4. Click Next.
  5. On the permissions page, search for and select the DataStori-ECSTaskManage-Policy you created in Step 3.
  6. Click Next.
  7. Name the role datastori-role and click Create role.
  8. Once created, find the role and copy its ARN.
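
Selecting Another AWS account makes the trusted account the principal in the role's trust policy, so identities in that account can call `sts:AssumeRole` on `datastori-role` when their own permissions allow it. The Python sketch below shows roughly what that trust policy looks like. The helper name is hypothetical, `111122223333` is a placeholder and not DataStori's real account ID, and the `aws` partition is assumed.

```python
import json


def cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Build a trust policy that lets another AWS account assume the role."""
    valid = (
        len(trusted_account_id) == 12
        and trusted_account_id.isascii()
        and trusted_account_id.isdigit()
    )
    if not valid:
        raise ValueError("an AWS account ID is exactly 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account root principal delegates trust to the whole account.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID; ask DataStori customer support for the real one.
print(json.dumps(cross_account_trust_policy("111122223333"), indent=2))
```

Comparing this shape with the trust relationship shown on the created role is a quick way to confirm that the correct account ID was entered.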

Logging (Optional)

By default, DataStori writes pipeline logs to CloudWatch. To send logs to a specific log group instead, share the ARN of that CloudWatch log group with the DataStori team.


Final Summary

Please provide the following information to the DataStori team to complete the setup. The values shown are examples only; replace them with your own.

  1. Your AWS Account ID: 123456789012
  2. VPC ID: vpc-0123abcd
  3. Subnet IDs: subnet-abcde123, subnet-fghij456
  4. Security Group IDs: sg-5678efgh
  5. S3 Bucket Name: your-datastori-bucket
  6. S3 Bucket Region: us-east-1
  7. ECS Cluster ARN: arn:aws:ecs:region:account-id:cluster/YourClusterName
  8. Task Execution Role ARN: arn:aws:iam::account-id:role/datastori-ecs-task-execution-role
  9. DataStori Cross-Account Role ARN: arn:aws:iam::account-id:role/datastori-role
  10. CloudWatch Log Group ARN (Optional): Provide this only if you have a preferred logging destination.
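
The ARNs in this summary follow fixed patterns, so they can be assembled from the account ID, region, and resource names used in this guide. The Python below is a minimal sketch of that assembly. The function names are hypothetical, the standard `aws` partition is assumed, and copying the ARNs directly from the console remains the safer source of truth.

```python
from typing import Dict, List, Optional


def iam_role_arn(account_id: str, role_name: str) -> str:
    """IAM is a global service, so role ARNs carry no region."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def ecs_cluster_arn(region: str, account_id: str, cluster_name: str) -> str:
    """ECS clusters are regional, so the region is part of the ARN."""
    return f"arn:aws:ecs:{region}:{account_id}:cluster/{cluster_name}"


def setup_summary(
    account_id: str,
    vpc_id: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    s3_bucket: str,
    s3_bucket_region: str,
    ecs_region: str,
    cluster_name: str,
    log_group_arn: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the values from the summary list into one dictionary."""
    summary = {
        "AWS Account ID": account_id,
        "VPC ID": vpc_id,
        "Subnet IDs": ", ".join(subnet_ids),
        "Security Group IDs": ", ".join(security_group_ids),
        "S3 Bucket Name": s3_bucket,
        "S3 Bucket Region": s3_bucket_region,
        "ECS Cluster ARN": ecs_cluster_arn(ecs_region, account_id, cluster_name),
        "Task Execution Role ARN": iam_role_arn(
            account_id, "datastori-ecs-task-execution-role"
        ),
        "DataStori Cross-Account Role ARN": iam_role_arn(account_id, "datastori-role"),
    }
    if log_group_arn:  # optional item; omitted when no preference is given
        summary["CloudWatch Log Group ARN"] = log_group_arn
    return summary


# Example values only, matching the placeholders in the list above.
summary = setup_summary(
    account_id="123456789012",
    vpc_id="vpc-0123abcd",
    subnet_ids=["subnet-abcde123", "subnet-fghij456"],
    security_group_ids=["sg-5678efgh"],
    s3_bucket="your-datastori-bucket",
    s3_bucket_region="us-east-1",
    ecs_region="us-east-1",
    cluster_name="YourClusterName",
)
for label, value in summary.items():
    print(f"{label}: {value}")
```

The ECS region is a separate parameter from the S3 bucket region because the cluster and the bucket are not required to be in the same region.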